post-mortem: f24 boot fails; need help.

William Mattison-2
Good morning,

The "f24 boot fails; need help" problem set me back a week.  I'm still
catching up.  I seriously believe it would be foolish for me to just
forget it.  I should for the benefit of others try to get at the real
cause and possible prevention.

A few hours before the failure, I received and looked at an e-mail that
I'm almost certain was at least a spoof, and possibly malicious.  I know
it contained html and links.  I did *** not *** click any of the links.
I looked at it, and deleted it.  It was viewed
in Thunderbird only.  The message's "From" ended with "yahoo.com".  My
question: It is highly improbable that that message had anything to do
with the boot failure.  Am I correct?

Also a few hours before the failure, I did some web browsing using
Firefox with NoScript and uBlock Origin.  As best as I recall, the
"riskiest" sites that I visited were finance.yahoo.com (and a few of its
sub-pages; I clicked no ads, no ad links) and indeed.com (and possibly a
posting or two).  My question: It is highly improbable that my web
browsing had anything to do with the boot failure.  Am I correct?

thanks,
Bill.
_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: post-mortem: f24 boot fails; need help.

Sam Varshavchik
William writes:

> A few hours before the failure, I received and looked at an e-mail that I'm  
> almost certain was at least a spoof, and possibly malicious.  I know it  
> contained html and links.  I did *** not ***
> click any of the links.  I looked at it, and deleted it.  It was viewed in  
> Thunderbird only.  The message's "From" ended with "yahoo.com".  My  
> question: It is highly improbable that that message had anything to do with  
> the boot failure.  Am I correct?

You are correct.

> Also a few hours before the failure, I did some web browsing using Firefox  
> with NoScript and uBlock Origin.  As best as I recall, the "riskiest" sites  
> that I visited were finance.yahoo.com (and a few of its sub-pages, I clicked  
> no ads, no ad links) and indeed.com (possibly and a posting or two).  My  
> question: It is highly improbable that my web browsing had anything to do  
> with the boot failure.  Am I correct?

You are correct, again.


Re: post-mortem: f24 boot fails; need help.

Rick Stevens-4
In reply to this post by William Mattison-2
On 05/24/2017 08:38 AM, William wrote:

> Good morning,
>
> The "f24 boot fails; need help" problem set me back a week.  I'm still
> catching up.  I seriously believe it would be foolish for me to just
> forget it.  I should for the benefit of others try to get at the real
> cause and possible prevention.
>
> A few hours before the failure, I received and looked at an e-mail that
> I'm almost certain was at least a spoof, and possibly malicious.  I know
> it contained html and links.  I did *** not ***
> click any of the links.  I looked at it, and deleted it.  It was viewed
> in Thunderbird only.  The message's "From" ended with "yahoo.com".  My
> question: It is highly improbable that that message had anything to do
> with the boot failure.  Am I correct?
>
> Also a few hours before the failure, I did some web browsing using
> Firefox with NoScript and uBlock Origin.  As best as I recall, the
> "riskiest" sites that I visited were finance.yahoo.com (and a few of its
> sub-pages, I clicked no ads, no ad links) and indeed.com (possibly and a
> posting or two).  My question: It is highly improbable that my web
> browsing had anything to do with the boot failure.  Am I correct?

It is unlikely that you got infected by the email--especially if you
didn't click on any links and you have "Show remote content" turned
off in the mail client.

The browser is probably safe as well. I don't use Firefox (I use Chrome)
and I have Adblock, UBlock, UMatrix and PrivacyBadger enabled on it.
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    [hidden email] -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
-           Vegetarian:  Old Indian word for "lousy hunter"          -
----------------------------------------------------------------------

Re: post-mortem: f24 boot fails; need help.

William Mattison-2
In reply to this post by William Mattison-2
Thank-you Sam and Rick.

For the next 2 questions, I'm not looking for numerical answers.  Qualitative probability terms on a scale going from "highly improbable" to "almost certainly" would be great.

The clock (and the CMOS battery) got some attention while trying to fix the boot problem.  I have not yet replaced the battery, but I'm not seeing any problems.  What is the likelihood that the battery or the clock caused the boot failure?

The boot failure occurred right after doing my weekly "dnf upgrade".  What is the likelihood that the "dnf upgrade" (or one of the patches installed by it) caused the problem?

thanks,
Bill.

Re: post-mortem: f24 boot fails; need help.

Joe Zeff-2
On 05/24/2017 09:20 PM, William Mattison wrote:
> The clock (and the CMOS battery) got some attention while trying to fix the boot problem.  I have not yet replaced the battery, but I'm not seeing any problems.  What is the likelihood that the battery or the clock caused the boot failure?

If the battery's weak enough to mess up the CMOS, it's possible.
However, long before that, your hardware clock will start to run slow.
(This is, actually, a built-in feature.  It's intended to let you know
that it's time to change the battery.)  If you can go into the CMOS
setup, before it tries to boot, see if everything looks right, and that
the clock is right.  If it's slow, turn things off without correcting
it, and try again in a few hours.  If it's farther behind, change the
battery and see if that helps.  I don't know if it's still true, but the
Print Screen key used to work there, and if so, you can use it to get a
printout of your settings to be used later if needed.
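Joe's check amounts to measuring a drift rate: how many seconds the hardware clock loses per day. A minimal sketch, assuming the clock was set exactly against a reference at one reading and compared again later; `drift_per_day` is a hypothetical helper name, not an existing command. On Linux, `hwclock --show` (as root) reads the hardware clock without entering the setup screen, with the caveats noted in the comments.

```shell
#!/bin/sh
# drift_per_day LAG HOURS (hypothetical helper)
#   LAG   = seconds the hardware clock is now behind the reference clock
#   HOURS = hours since the hardware clock was last set exactly right
# Prints the loss in seconds per day, rounded to two decimals.
drift_per_day() {
    awk -v lag="$1" -v hours="$2" 'BEGIN {
        if (hours <= 0) { print "HOURS must be positive" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", lag * 24 / hours
    }'
}

# Reading the hardware clock from Linux (as root):
#   hwclock --show
# Caveats: a dual-boot machine with Windows often keeps the hardware clock in
# local time, and Linux may periodically copy the NTP-synced system time into
# it, either of which can hide the drift you are trying to measure.
```

For scale, losing 1 second over 12 hours gives 2.00 s/day; Joe later puts the failing-battery symptom at minutes per day.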

Re: post-mortem: f24 boot fails; need help.

Rick Stevens-4
On 05/24/2017 11:40 PM, Joe Zeff wrote:

> On 05/24/2017 09:20 PM, William Mattison wrote:
>> The clock (and the CMOS battery) got some attention while trying to
>> fix the boot problem.  I have not yet replaced the battery, but I'm
>> not seeing any problems.  What is the likelihood that the battery or
>> the clock caused the boot failure?
>
> If the battery's weak enough to mess up the CMOS, it's possible.
> However, long before that, your hardware clock will start to run slow.
> (This is, actually a built in feature.  It's intended to let you know
> that it's time to change the battery.)  If you can go into the CMOS
> setup, before it tries to boot, see if everything looks right, and that
> the clock is right.  If it's slow, turn things off without correcting
> it, and try again in a few hours.  If it's farther behind, change the
> battery and see if that helps.  I don't know if it's still true, but the
> Print Screen key used to work there, and if so, you can use it to get a
> printout of your settings to be used later if needed.

I agree with Joe. I'd imagine the battery would only cause issues if the
BIOS got messed up somehow. A slow clock wouldn't necessarily cause a
boot issue--but you might get a lot of weird "file date is in the
future" errors caused by the OS looking at the clock (which is slow) and
comparing it against file dates (which were set using the correct time).
This can also cause strangeness with LDAP and Kerberos authentication or
Samba operations as they're time-sensitive.

Otherwise, with a weak battery the BIOS will usually revert to default
settings which are generally considered conservative and "safe". If it
only managed to partially set up the defaults, weird stuff can happen.
Note that if you had modified the boot order from the BIOS-based
defaults, then you'd usually see the boot stall at the device selection
point. If you set some other things (disk caches, memory timings, wait
states and other items which gamers tend to mess with) then it could
cause additional funkiness.
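Rick's "file date is in the future" symptom can be checked directly. A minimal sketch, not a standard tool: `future_files` (hypothetical name) lists files under a directory whose modification time is later than the current system clock, using a freshly created temporary file as the "now" reference for `find -newer`. Run it while you suspect the clock is behind; anything it prints was stamped at a time the clock has not yet reached.

```shell
#!/bin/sh
# future_files [DIR]: list files under DIR (default: current directory)
# whose modification time is later than "now", staying on one filesystem.
future_files() {
    ref=$(mktemp) || return 1     # the new temp file's mtime is "now"
    find "${1:-.}" -xdev -newer "$ref" -print 2>/dev/null
    rm -f "$ref"
}
# Example: future_files "$HOME"
```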

Re: post-mortem: f24 boot fails; need help.

Tim-163
On Thu, 2017-05-25 at 12:47 -0700, Rick Stevens wrote:
> Otherwise, with a weak battery the BIOS will usually revert to default
> settings which are generally considered conservative and "safe".

I'm not so sure that's the case.  In many PCs, the BIOS clock, BIOS
memory, and perhaps other BIOS hardware, are powered solely by the
battery (even when the computer is running off mains power).  So, with
failing power you could have all manner of random things happen.
Digital circuits don't work well when not fully powered.

If it had completely failed, then I might expect default settings to be
adopted at power up - assuming that the computer would power up with a
dead BIOS battery.

Some BIOSes, though, use an EEPROM as non-volatile memory, rather than
just low-power RAM with a battery to keep it working, which makes losing
the settings very unlikely.  A friend of mine had a PC with a three-way
switch to decide which BIOS settings to use when booting up, and if I
recall correctly, two of them were EEPROM stored.  It was designed as a
geek's motherboard: you could use the feature to have turbo settings,
stable settings, experimental settings, and always be able to boot up by
flipping the switch if you'd changed something in a bad way.

If you believe your BIOS settings may have been scrambled, it may be a
good idea to select the reset-to-defaults option, save, then go back and
set any personal options, so that every BIOS setting is forced to a known
value.

I'm still not convinced by the cargo-cult idea that the BIOS clock is
actually designed to run slow, rather than that simply being a common
side-effect.  I've certainly had a motherboard where that effect did not
happen.


Re: post-mortem: f24 boot fails; need help.

Tom Killian
In reply to this post by William Mattison-2
Some years ago I had an IBM ThinkPad that one day failed to boot, and every subsystem diagnostic that ran at power-up (keyboard, memory, disk controller, ...) reported a problem.  On a whim I put in a new clock battery and everything was fine.  Now any time a machine suddenly goes flakey, the clock battery is the first thing that gets replaced.

On Fri, May 26, 2017 at 21:22:17 +0930, Tim wrote:
> On Thu, 2017-05-25 at 12:47 -0700, Rick Stevens wrote:
> > Otherwise, with a weak battery the BIOS will usually revert to default
> > settings which are generally considered conservative and "safe".
>
> I'm not so sure that's the case.  In many PCs, the BIOS clock, BIOS
> memory, and perhaps other BIOS hardware, are powered solely by the
> battery (even when the computer is running off mains power).  So, with
> failing power you could have all manner of random things happen.

Re: post-mortem: f24 boot fails; need help.

Rick Stevens-4
On 05/26/2017 10:48 AM, Tom Killian wrote:
> Some years ago I had an IBM ThinkPad that one day failed to boot, and
> every subsystem diagnostic that ran at power-up (keyboard, memory, disk
> controller, ...) reported a problem.  On a whim I put in a new clock
> battery and everything was fine.  Now any time a machine suddenly goes
> flakey, the clock battery is the first thing that gets replaced.

That's one of the standard things I do during my yearly maintenance of
machines (shut down, pull cards, clean contacts, vacuum out dust and
other detritus, replace BIOS batteries, replace fans, put it all back
together, then go howl at a full moon and hope they boot up again).

>
> On Fri, May 26, 2017 at 21:22:17 +0930, Tim wrote:
>
>> On Thu, 2017-05-25 at 12:47 -0700, Rick Stevens wrote:
>>> Otherwise, with a weak battery the BIOS will usually revert to default
>>> settings which are generally considered conservative and "safe".
>>
>> I'm not so sure that's the case.  In many PCs, the BIOS clock, BIOS
>> memory, and perhaps other BIOS hardware, are powered solely by the
>> battery (even when the computer is running off mains power).  So, with
>> failing power you could have all manner of random things happen.

Re: post-mortem: f24 boot fails; need help.

Joe Zeff-2
In reply to this post by Tim-163
On 05/26/2017 04:52 AM, Tim wrote:
> I'm still not convinced with the cargo-cult idea that the BIOS clock is
> actually designed to run slow, rather than that simply being a common
> side-effect.  I've certainly had a motherboard where that effect did not
> happen.

I've had several slow-clock issues over the decades solved by changing
the battery, and I've never had it not work.  Just because you don't
believe it doesn't make it "cargo-cult."

Re: post-mortem: f24 boot fails; need help.

William Mattison-2
In reply to this post by William Mattison-2
Good evening,

Hardware problems have seriously tied me up for about a week now.  My apologies for my silence on this topic.  The hardware issue is not really fixed yet.  I likely will be forced off-line again for several days to a few weeks.  If I'm not responding, assume that that's what's happening.

The fix on Thursday, May 18 did not last.  This past Thursday, my workstation again failed to boot.  This time, it dropped me into an emergency shell, not the dracut shell.  This time, the log file was almost twice as long.  But it reported fsck failures again, this time on sda7 rather than sda6.  So I tried what my friend did, but with "/dev/sda7" instead of "/dev/sda6" as the command parameter.  I spent 30-45 minutes doing nothing but rapidly hitting the 'y' key before the command finally completed.  (Apparently, hundreds of i-nodes were corrupted this time.)  Then the workstation successfully booted.
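The half hour of pressing 'y' can be avoided: `fsck -y` passes `-y` through to e2fsck, which then answers yes to every repair prompt, exactly what was being done by hand (and with the same lack of review). A minimal sketch, assuming an ext2/3/4 filesystem; `repair_fs` is a hypothetical wrapper that refuses to run while the partition is mounted, because checking a mounted filesystem can make the damage worse.

```shell
#!/bin/sh
# repair_fs DEV: run fsck with automatic "yes" answers on an UNMOUNTED
# partition. Assumes ext2/3/4, where fsck hands -y through to e2fsck.
repair_fs() {
    dev=$1
    [ -n "$dev" ] || { echo "usage: repair_fs /dev/sdXN" >&2; return 2; }
    # lsblk prints the mount point, or an empty line if not mounted.
    if lsblk -no MOUNTPOINT "$dev" 2>/dev/null | grep -q .; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    fsck -y "$dev"
}
# Example (from an emergency or rescue shell, as root):
#   repair_fs /dev/sda7
```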

I think I spent a week trying to get into BIOS.  But I wasn't seeing a BIOS screen before the grub menu showed up.  I think it was when I shut down and started up a different way that I finally saw the BIOS screen.  I quickly changed the time for the BIOS screen from 2 seconds to 8 seconds.  As suggested in this discussion, I checked the voltages and the clock.  The voltages looked fine.  The clock was about 5 seconds slow compared to my "atomic" clock.  I adjusted that.  This morning, the clock was barely noticeably slow compared to that atomic clock: less than a second behind.  So I'm agreeing with your suspicions that the battery is getting low.

This morning, I tried to replace the battery.  Most of the motherboard (ASUS Sabertooth Z77, bought in early 2013) is covered by a hard, dark gray plastic cover.  The battery should be under that, below the graphics card socket.  I could not find a way of getting that cover off.  Neither the user's guide nor the support dvd provided any clues.  The ASUS web site GUI for submitting a support request did not work.  Any ideas?

If I have to replace the motherboard, will I have to re-install Fedora and windows-7 (it's a dual-boot system)?

I find it odd that this problem:
* did not seem to affect windows-7 (yet?).
* happened only immediately after doing my weekly Fedora patches ("dnf upgrade").
* did not occur for a week between the first and second occurrences.
* would corrupt so many i-nodes the second time.

Once the battery gets low enough, I'll have no access to the internet or this list.  How can I get help if I need it?  My problems will be beyond what my local IT friends can handle.

Thank-you for your help so far.
Bill.

Re: post-mortem: f24 boot fails; need help.

Joe Zeff-2
On 05/29/2017 07:42 PM, William Mattison wrote:
> The clock was about 5 seconds slow compared to my "atomic" clock.  I adjusted that.  This morning, the clock seemed barely noticeably slow compared to that atomic clock, but by less than a second.  So I'm agreeing with your suspicions that the battery is getting low.

If your battery is getting low, it's just barely starting.  Usually,
when it becomes an issue, you see a change of minutes per day, not one
or two seconds.  Still, changing it can't hurt.  However, your hard disk
issues are making me wonder if either the disk or the controller isn't
at fault.  It's clearly a hardware issue to me, but there are still
several possibilities for just what's gone bad.
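One way to separate disk, cable, and controller trouble from filesystem damage is the kernel log from the failed boot: libata link resets and I/O errors are logged before fsck ever runs. A minimal sketch, assuming journald keeps a persistent journal (otherwise there is no previous boot for `-b -1` to show); the match pattern is an illustrative guess at common libata, block-layer, and ext4 messages, not an exhaustive list, and `disk_errors` is a hypothetical name.

```shell
#!/bin/sh
# disk_errors [BOOT]: show kernel messages that commonly accompany failing
# disks, cables or SATA controllers. BOOT is a journalctl boot offset:
# -1 = previous boot (default), 0 = current boot.
disk_errors() {
    journalctl -k -b "${1:--1}" --no-pager 2>/dev/null |
        grep -Ei 'ata[0-9.]+: (exception|failed command|hard resetting link|SError)|I/O error|EXT4-fs error'
}
# Example: disk_errors -1
```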

Re: post-mortem: f24 boot fails; need help.

Tim-163
In reply to this post by William Mattison-2
On Tue, 2017-05-30 at 02:42 +0000, William Mattison wrote:

> The fix on Thursday, May 18 did not last.  This past Thursday, my
> workstation again failed to boot.  This time, it dropped me into an
> emergency shell, not the dracut shell.  This time, the log file was
> almost twice as long.  But it reported fsck failures again, this time
> on sda7 rather than sda6.  So I tried what my friend did, but with
> "/dev/sda7" instead of "/dev/sda6" as the command parameter.  I spent
> 30-45 minutes doing nothing but rapidly hitting the 'y' key before the
> command finally completed.  (Apparently, hundreds of i-nodes were
> corrupted this time.)  Then the workstation successfully booted.
>
> I think I spent a week trying to get into BIOS.  But I wasn't seeing a
> BIOS screen before the grub menu showed up.  I think it was when I
> shut down and started up a different way that I finally saw the BIOS
> screen.  I quickly changed the time for the BIOS screen from 2 seconds
> to 8 seconds.  As suggested in this discussion, I checked the voltages
> and the clock.  The voltages looked fine.  The clock was about 5
> seconds slow compared to my "atomic" clock.  I adjusted that.  This
> morning, the clock seemed barely noticeably slow compared to that
> atomic clock, but by less than a second.  So I'm agreeing with your
> suspicions that the battery is getting low.

Actually, I wouldn't call the BIOS clock being 5 seconds off much to
worry about (with regard to the battery).  They're not
particularly accurate to begin with; they're on a par with a cheap wristwatch.
However, if your battery is a few years old, you may as well replace it
now that you're in the mood to do so.  They do have a finite lifespan.

If the BIOS voltage monitors say the voltages are fine, they probably
are.  Though they're not always super accurate, either.  Software that
lets you read these values while the OS is running has to apply
correction factors to them.

Since you talk about many file system errors, and difficulty booting,
I'm inclined to point the finger at the main power supply.  If it's not
up to the task of powering everything, or is randomly glitching, that
could cause all sorts of instabilities.
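If the power supply is the suspect, watching the voltage rails while the machine is under load (during a large `dnf upgrade`, for instance) may reveal more than one glance at the BIOS screen. A minimal sketch, assuming the lm_sensors package is installed and `sensors-detect` has identified the board's monitoring chip; as noted above, the readings may be uncorrected, so a sudden sag is more telling than the absolute value. `voltages` is a hypothetical helper name.

```shell
#!/bin/sh
# voltages: print only the lines of lm_sensors' `sensors` output whose
# reading is in volts, e.g. "+12V:  +12.10 V  (min = ...)".
voltages() {
    sensors 2>/dev/null | awk -F: 'NF > 1 && $2 ~ /[+-]?[0-9]+\.[0-9]+ V/'
}

# Log the rails every 5 seconds while something heavy runs (Ctrl-C to stop):
#   while :; do date +%T; voltages; sleep 5; done | tee volts.log
```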

Also, since you're taking things apart, it may well be a good idea to
unplug everything and reconnect it, just to exercise the connections
(cards, RAM, cables, etc.).  Cards have a habit of walking out due to
thermal changes, or mechanical stress when moving a flimsy case around.
Clean any exposed slots (e.g. unused PCIe slots).

I just did this simple search, and there are even videos of how to change
the battery, right at the top.  Though I don't think much of one
person's crude "cut through the cover" technique.
https://www.google.com.au/search?q=ASUS+Sabertooth+Z77+bios+battery

This seems more sensible:
https://www.youtube.com/watch?v=aSTTR_WVtx0
long video, but he's done it by 4 minutes in.

Perhaps ASUS thinks that by the time the battery is crapping out, you'll
have reached the stage of wanting to buy a newer PC.

--
[tim@localhost ~]$ uname -rsvp
Linux 3.9.10-100.fc17.x86_64 #1 SMP Sun Jul 14 01:31:27 UTC 2013 x86_64
(always current details of the computer that I'm writing this email on)

Boilerplate:  All mail to my mailbox is automatically deleted, there is
no point trying to privately email me, I only get to see the messages
posted to the mailing list.

The weekly life-cycle of the electronics enthusiast: Monday: Get an
idea, and draft it out. Tuesday: Go and buy the parts. Wednesday: Solder
the components together. Thursday: Build the casing and install the
electronics. Friday: Start getting it to work and fine tuning. Saturday:
Neatly install the finished product and use it for an hour. Sunday:
Watch smoke escape when you turn it on, prepare shopping list for new
parts to buy, tomorrow.


Re: post-mortem: f24 boot fails; need help.

William Mattison-2
In reply to this post by William Mattison-2
I wasn't fully convinced these problems are due to the battery.  That's why I listed the four things I found "odd".  On the other hand, I recall hearing and reading that the output of lithium batteries is almost flat (better than any other type of battery), but then very quickly drops (faster than any other type of battery) as it reaches end-of-life.

Back to diagnosing the real cause of the problems...

Is there a Fedora command that I can use to check the hard drive (not the file systems) for bad blocks, sectors, tracks, etc?  Is there a Fedora command that I can use to check the controller?

Both problems occurred immediately after doing a "dnf upgrade".  What is that telling us?  Does "dnf upgrade" access the hard drive or the controller in a way that normal daily use does not?  Is there something different about the first boot after a "dnf upgrade" vs other boots?  I shut down every night, and boot up every morning.
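One way to make "right after dnf upgrade" concrete is to see what that transaction installed. `dnf history info last` lists the packages in the most recent transaction; the sketch below filters it to boot-related packages. The package list is an assumption about what most often affects booting, not an authoritative set, and `boot_pkgs_changed` is a hypothetical name.

```shell
#!/bin/sh
# boot_pkgs_changed [ID]: show boot-related packages touched by a dnf
# transaction (ID defaults to "last"). The package list is illustrative.
boot_pkgs_changed() {
    dnf history info "${1:-last}" 2>/dev/null |
        grep -E '(^|[[:space:]])(kernel|dracut|systemd|grub2|e2fsprogs|util-linux|linux-firmware)[-[:alnum:]]*-[0-9]'
}
# Follow-ups if something suspicious shows up:
#   dnf history list          # find the transaction ID
#   dnf history undo <ID>     # roll that transaction back
```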

When I bought the system 4+ years ago, I bought separate parts.  This is a DIY desktop.  I was advised to buy more power supply than needed.  I did so.  So unless the power supply is failing, I would think it's not a good candidate for the cause of the two problems.  There have been no problems until this month, and I've been doing weekly patches since I got the system in 2013.

I was/am not in the mood to change the battery!  Since I've already bought the new one and have no other use for it, and since the old one is 4+ years old, I plan to change the battery either Friday or Saturday.  But you know what they say: "If you want to make God laugh, tell Him your plans!".  I did watch the youtube that Tim provided.  I don't recall seeing screws on the underside of the motherboard.  I'll look again Friday or Saturday (God willing!).

Thanks,
Bill.

Re: post-mortem: f24 boot fails; need help.

Rick Leir
Bill,

Power supplies can fail at any time, and they are less reliable than any
other parts in my PCs.

PCs are more reliable if you leave them on, configured to go into sleep
mode when left unused (this statement will spark a discussion).

Most spinning disk drives these days support S.M.A.R.T., which you can
query with smartctl and monitor with the smartd daemon.

http://www.linuxjournal.com/magazine/monitoring-hard-disks-smart

An exception would be hardware RAID (shown below); its manufacturer
would supply management tools.

With smartctl, expect the failure counts to be non-zero: all disks have
errors, which get remapped. The error counts can be alarming even though
the disk is fine for normal use. If the errors suddenly increase, you
will want to save some good backups Right Soon Now.

Spinning disks often fail gradually over days or weeks. SSDs can
suddenly drop completely, with no remedy.
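Rick's advice to watch for a sudden increase in error counts can be automated by saving `smartctl -A` output now and then and comparing snapshots. A minimal sketch; the four attributes it watches (5 Reallocated_Sector_Ct, 187 Reported_Uncorrect, 197 Current_Pending_Sector, 198 Offline_Uncorrectable) are this example's choice, and it assumes the usual ATA attribute table where RAW_VALUE is the tenth column. SAS devices and hardware RAID volumes, like the HP logical volume below, report in a different format. `smart_increases` is a hypothetical name.

```shell
#!/bin/sh
# smart_increases OLD NEW: compare two saved `smartctl -A` snapshots and
# report watched attributes whose raw value went up.
smart_increases() {
    awk '
        $1 ~ /^(5|187|197|198)$/ {
            if (FILENAME == ARGV[1]) { old[$2] = $10 }
            else if (($2 in old) && $10 + 0 > old[$2] + 0) {
                print $2 ": " old[$2] " -> " $10
            }
        }' "$1" "$2"
}
# Take a snapshot (as root), e.g. weekly:
#   smartctl -A /dev/sda > "smart-sda-$(date +%F).txt"
# Then compare an older snapshot with a newer one:
#   smart_increases smart-sda-OLD.txt smart-sda-NEW.txt
```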

cheers -- Rick

--RAID--

# smartctl --all /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.11.3-300.fc26.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              LOGICAL VOLUME
Revision:             3.66
User Capacity:        1,200,186,941,440 bytes [1.20 TB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Logical Unit id:      0x600508b1001c4a3abee7e559b116e419
Serial number:        50014380145ECE10
Device type:          disk
Local Time is:        Wed May 31 01:58:57 2017 CDT
SMART support is:     Unavailable - device lacks SMART capability.


--SSD--


root@lite:~# smartctl --all /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-20-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO 250GB
Serial Number:    S21NNS999999
..
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
..

--SATA--
# smartctl --all /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-20-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD103SJ
Serial Number:    S2QPJ9KB99999
..
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
..


Re: post-mortem: f24 boot fails; need help.

Joseph Loo-2
In reply to this post by William Mattison-2
On 05/30/2017 09:09 PM, William Mattison wrote:

> Is there a Fedora command that I can use to check the hard drive (not the file systems) for bad blocks, sectors, tracks, etc?  Is there a Fedora command that I can use to check the controller?
>
> Both problems occurred immediately after doing a "dnf upgrade".  What is that telling us?  Does "dnf upgrade" access the hard drive or the controller in a way that normal daily use does not?  Is there something different about the first boot after a "dnf upgrade" vs other boots?  I shut down every night, and boot up every morning.
>
> When I bought the system 4+ years ago, I bought separate parts.  This is a DIY desktop.  I was advised to buy more power supply than needed.  I did so.  So unless the power supply is failing, I would think it's not a good candidate for the cause of the two problems.  There have been no problems until this month, and I've been doing weekly patches since I got the system in 2013.
>
> I was/am not in the mood to change the battery!  Since I've already bought the new one and have no other use for it, and since the old one is 4+ years old, I plan to change the battery either Friday or Saturday.  But you know what they say: "If you want to make God laugh, tell Him your plans!".  I did watch the youtube that Tim provided.  I don't recall seeing screws on the underside of the motherboard.  I'll look again Friday or Saturday (God willing!).
>
> Thanks,
> Bill.
>
Have you tried badblocks? It does a sector-by-sector scan. Be careful
with its options, though: the write-mode test (-w) will wipe your disk
completely.
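A minimal sketch of how to keep a badblocks run read-only, assuming you build the command in a script rather than typing it by hand. The helper names `is_write_flag` and `build_badblocks_cmd` are hypothetical; the only facts relied on are that badblocks reads without writing unless given -w (destructive) or -n (read-write), and that -s and -v mean "show progress" and "verbose".

```python
import shlex


def is_write_flag(flag):
    """True for badblocks options that write to the disk.

    -w overwrites every block with test patterns (destroying all data);
    -n is the "non-destructive" read-write mode, which still writes to
    every block. Bundled short options such as "-wsv" are caught too.
    """
    return (flag.startswith("-") and not flag.startswith("--")
            and any(ch in "wn" for ch in flag[1:]))


def build_badblocks_cmd(device, flags=("-s", "-v")):
    """Return a read-only badblocks command line for `device`.

    Without -w or -n, badblocks only reads each block, so existing
    data is left alone. -s shows progress and -v is verbose.
    """
    for flag in flags:
        if is_write_flag(flag):
            raise ValueError(f"{flag} would write to {device}; refusing")
    return ["badblocks", *flags, device]


cmd = build_badblocks_cmd("/dev/sda")
print(shlex.join(cmd))  # badblocks -s -v /dev/sda
# To run it for real (as root):
#   import subprocess; subprocess.run(cmd, check=True)
```

The guard refuses write modes outright instead of asking for confirmation, on the theory that a disk already showing errors is the last one you want to overwrite by accident.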



--
Joseph Loo
[hidden email]
_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: post-mortem: f24 boot fails; need help.

Tim-163
In reply to this post by William Mattison-2
Allegedly, on or about 31 May 2017, William Mattison sent:
> I recall hearing and reading that the output of lithium batteries is
> almost flat (better than any other type of battery), but then very
> quickly drops (faster than any other type of battery) as it reaches
> end-of-life.

I can't say that I'm familiar with their discharge pattern, but I have
read that an in-use lifespan of three years is considered normal.  So
you're at the point where it might be worth replacing, even if it's not
the cause of the current problems.  At the very least, you stop it from
becoming a problem in another year or so.

> Is there a Fedora command that I can use to check the hard drive (not
> the file systems) for bad blocks, sectors, tracks, etc?  Is there a
> Fedora command that I can use to check the controller?

Look up S.M.A.R.T. (the smartctl command, from the smartmontools
package).  Be aware that some controllers may not co-operate, though
that tends to be things like external USB interfaces, or RAID.  Ordinary
hard drives plugged straight into the motherboard are likely to be
checkable.  It's the hard drive itself that checks its health and
produces the stats; smartctl just gives you an interface.
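As a rough illustration of "smartctl just gives you an interface", here is a sketch that runs `smartctl -H` and reads the drive's own verdict. It assumes an ATA/SATA drive (SCSI drives word the verdict differently), and the function names are hypothetical. Note that PASSED only means no attribute has crossed the drive's failure threshold, so a drive with pending sectors can still pass.

```python
import subprocess

VERDICT = "overall-health self-assessment test result:"


def parse_health(output):
    """Return True if smartctl reports PASSED, False for any other verdict,
    or None if there is no ATA verdict line (e.g. SMART hidden by a USB bridge)."""
    for line in output.splitlines():
        if VERDICT in line:
            return line.split(VERDICT, 1)[1].strip() == "PASSED"
    return None


def drive_health(device="/dev/sda"):
    """Run `smartctl -H` (needs root and smartmontools installed)."""
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # verdict is PASSED, so don't use check=True; just read the text.
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return parse_health(result.stdout)


sample = "SMART overall-health self-assessment test result: PASSED\n"
print(parse_health(sample))  # True
```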

> Both problems occurred immediately after doing a "dnf upgrade".  What
> is that telling us?

That you ought to try rebooting using a previous kernel, and see if
problems persist.

Problems that appear right after an update raise two red flags:

1. That a new kernel has changed hardware drivers, or created other
incompatibilities.

2. That your hard drive had some bad spots that hadn't been used before,
but as you filled it up with more files (the recent downloads and
installs), you hit the problem area.

Those are the two things that immediately jump to mind.

Yes, an update can be more stressful than other PC activities, for
*some* users.  But for other users, they're always subjecting their PC
to a heavy workload, so a prolonged update session is nothing different
from normal use.

> Does "dnf upgrade" access the hard drive or the controller in a way
> that normal daily use does not?

I would say not.  It's just files in and out, under the control of some
program, onto the storage system in the usual way.

> When I bought the system 4+ years ago, I bought separate parts.  This
> is a DIY desktop.  I was advised to buy more power supply than needed.
> I did so.  So unless the power supply is failing, I would think it's
> not a good candidate for the cause of the two problems.  There have
> been no problems until this month, and I've been doing weekly patches
> since I got the system in 2013.

Power supplies do fail, sometimes gradually, sometimes spontaneously
combusting, sometimes just randomly glitching.  It can be complete
coincidence that some technical failure happens at the same time as you
did something you considered more special than it merely sitting there.

I agree with the concept of getting bigger than you think you need, but
it's hard to work out the criteria.  Few devices specify their power
requirements at all, or specify them adequately.  E.g. a graphics card
may say it needs a 100 watt power supply.  That claim may be bogus, it
may be a deliberate overestimate so that you buy an adequate supply, or
it may be accurate.  It doesn't specify how many watts it requires from
each of the different supplies in your PC (12 volt, 5 volt, 3.3 volt,
etc).  So it could require a lot from the 12 volt supply and less from
the 5 volt, and your power supply could be inadequate in one of those
areas.
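The per-rail point can be made concrete with P = V × I: a PSU label lists a maximum current for each rail, and each component draws some current from each rail. The sketch below checks every rail separately. All the amp figures are hypothetical placeholders, not specs for any real PSU or card; real numbers would come from the PSU label and whatever the component makers publish.

```python
# Hypothetical figures for illustration only.
PSU_RAIL_AMPS = {12.0: 46.0, 5.0: 20.0, 3.3: 20.0}  # volts -> max amps

# Estimated draw per component, in amps on each rail (also hypothetical).
LOADS = {
    "cpu":      {12.0: 8.0},
    "graphics": {12.0: 11.0},
    "drives":   {12.0: 2.0, 5.0: 3.0},
    "board":    {5.0: 2.0, 3.3: 5.0},
}


def rail_report(psu, loads):
    """Return {volts: (demand_watts, capacity_watts)}, using P = V * I."""
    report = {}
    for volts, max_amps in psu.items():
        amps = sum(parts.get(volts, 0.0) for parts in loads.values())
        report[volts] = (volts * amps, volts * max_amps)
    return report


for volts, (need, have) in sorted(rail_report(PSU_RAIL_AMPS, LOADS).items(),
                                  reverse=True):
    status = "OK" if need <= have else "OVERLOADED"
    print(f"{volts:>4} V rail: {need:6.1f} W of {have:6.1f} W  {status}")
```

A supply can pass on total wattage and still be short on one rail, which is exactly the case this check is meant to expose. Many PSU labels also list a combined limit for the 3.3 V and 5 V rails, which this sketch ignores.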

Then there's the power supply specs.  Do they list the power it can
supply continuously, and the momentary higher peaks that it can supply?
And there's a similar question with the devices: do a graphics card's
power requirements specify both continuous and momentary peak draw?

The momentary peaks, as something suddenly needs more power as it turns
on or changes modes, etc., can be the kind of thing that causes enough
trouble to make a system unstable.

If you have a simple system, e.g. motherboard, graphics card, hard
drive, optical drive, it's not too hard to ensure you put in a
sufficiently beefy supply.  If you have a PC loaded with gadgets, it's
harder to estimate the requirements.

But what type of power supply did you put in?  Did you match the wattage
your supplier said you needed, did you overcompensate by an extra 100
watts?  Did you get some generic Chinese thing, or something that had a
reputation?

As an opposing example:  I stripped apart a friend's Mac; it has a
ridiculously beefy power supply, with large fat bus bars that bolt to
the motherboard, rather than the multi-pin Molex connectors you see on
the average PC.  And that system is designed as a whole, so the
manufacturer ought to know the full system specs, as opposed to a PC
assembled from parts by multiple vendors who never collaborated.

> I did watch the youtube that Tim provided.  I don't recall seeing
> screws on the underside of the motherboard.  I'll look again Friday or
> Saturday (God willing!).

I could see them on one of the videos, quite small silver ones,
underneath the motherboard (you had to completely remove the board).
But maybe they've switched to black ones, that need careful inspection
to find.

I agree with the comments that ASUS made a prize design goof by burying
the CMOS battery under that plating.  I understand the value of covering
the whole board (forcing cooling across it, making it harder for
accidentally dropped things to land on exposed conductors, etc), but
they should have left a way to easily access the battery.

--
[tim@localhost ~]$ uname -rsvp
Linux 3.9.10-100.fc17.x86_64 #1 SMP Sun Jul 14 01:31:27 UTC 2013 x86_64
(always current details of the computer that I'm writing this email on)

Boilerplate:  All mail to my mailbox is automatically deleted, there is
no point trying to privately email me, I only get to see the messages
posted to the mailing list.

Long ago I gave up on using Windows (TM) [Tantrum Machine], and I've
never regretted it.


_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: post-mortem: f24 boot fails; need help.

William Mattison-2
In reply to this post by Rick Leir
I did "smartctl --all /dev/sda > smartctl_out.txt".  I got over 200 lines of output.  The most recent error reported in the output file is this one:

===============
Error 66 occurred at disk power-on lifetime: 13741 hours (572 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      00:05:43.747  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00      00:05:43.746  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 00      00:05:43.746  FLUSH CACHE EXT
  ef 10 02 00 00 00 a0 00      00:05:43.746  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:05:43.745  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
===============

I can't really make heads or tails of this.  I also noticed these 2 messages in my system e-mail, both on Thursday, May 25:
(1st message)
Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors
(2nd message, 1 minute later)
Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
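For anyone else puzzling over that error entry, here is a small decoding sketch (function names are hypothetical). It converts the power-on hours, reads the ER and ST register bytes using the classic ATA bit layout (ER bit 0x40 is UNC, the "uncorrectable" smartctl printed; ST bits 0x40 and 0x01 are DRDY and ERR), and converts the hex LBA. One hedge: 0x0FFFFFFF is also the largest value a 28-bit LBA can hold, and a 2 TB drive has far more sectors than that, so the real location may simply not fit in this log format; `smartctl -l xerror` shows the extended error log, which may give the full LBA.

```python
import re


def decode_error_header(line):
    """Split 'Error N occurred at disk power-on lifetime: H hours ...'."""
    m = re.search(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours",
                  line)
    if not m:
        raise ValueError("not a smartctl error-log header")
    number, hours = int(m.group(1)), int(m.group(2))
    days, rem_hours = divmod(hours, 24)
    return number, hours, days, rem_hours


def decode_registers(er, st):
    """Interpret the ER (error) and ST (status) bytes with the ATA bit layout."""
    return {
        "UNC (uncorrectable data)": bool(er & 0x40),
        "DRDY (drive ready)": bool(st & 0x40),
        "ERR (command failed)": bool(st & 0x01),
    }


header = ("Error 66 occurred at disk power-on lifetime: "
          "13741 hours (572 days + 13 hours)")
print(decode_error_header(header))   # (66, 13741, 572, 13)
print(decode_registers(0x40, 0x51))
print(0x0FFFFFFF)                    # 268435455
```

Read this way, the entry says the drive was ready, the read command failed, and the failure was an uncorrectable data error, which fits the "pending" and "offline uncorrectable" sectors smartd reported.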

I also tried "smartctl -t short /dev/sda", followed later by "smartctl -l selftest /dev/sda".  The result:
===============
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13813         -
===============
If I understand the "--all" output correctly, the long test ("-t long") would take about 4 hours, so I'm not trying that until later this week.
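Once the long test (`smartctl -t long /dev/sda`) finishes, `smartctl -l selftest /dev/sda` lists it next to the short one. A sketch for pulling out the rows, assuming the column layout shown above (columns separated by runs of two or more spaces). `parse_selftest_log` is a hypothetical name. A value other than "-" in LBA_of_first_error would point at the first failing sector.

```python
import re

# One row of `smartctl -l selftest` output, for example:
# "# 1  Short offline       Completed without error       00%     13813         -"
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)"
)


def parse_selftest_log(text):
    """Return one dict per self-test row; first_error_lba is None for '-'."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "num": int(m["num"]),
                "test": m["test"],
                "status": m["status"],
                "remaining_pct": int(m["remaining"]),
                "hours": int(m["hours"]),
                "first_error_lba": None if m["lba"] == "-" else m["lba"],
            })
    return rows


LOG = """SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13813         -
"""
for row in parse_selftest_log(LOG):
    print(row["test"], "|", row["status"], "|", row["first_error_lba"])
# Short offline | Completed without error | None
```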

What else from the "smartctl" output should I post here?
What other "smartctl" functionality should I try or use?
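One useful thing to post is the handful of attributes that count bad sectors, from `smartctl -A /dev/sda`: 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). The two smartd warnings above come from 197 and 198. A sketch that picks them out, assuming the standard ten-column attribute table. `sector_attributes` is a hypothetical name, and the sample rows are made up to match the smartd e-mails, not copied from Bill's output.

```python
# Attribute IDs that count bad sectors on ATA drives.
WATCH = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def sector_attributes(text):
    """Map watched attribute IDs to their RAW_VALUE in `smartctl -A` output."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCH:
            found[int(fields[0])] = int(fields[9])
    return found


# Illustrative rows only; the raw values are invented to match the smartd mail.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""
for attr_id, raw in sorted(sector_attributes(SAMPLE).items()):
    print(attr_id, WATCH[attr_id], raw)
```

Watching whether these raw values stay put or keep climbing over a few weeks says more than any single reading.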

Thank-you.
Bill.
_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: post-mortem: f24 boot fails; need help.

William Mattison-2
In reply to this post by Tim-163

> Look up S.M.A.R.T., though be aware that some controllers may not
> co-operate, but that tends to be things like outboard USB interfaces, or
> RAID.  Ordinary hard drives plugged straight into the motherboard are
> likely to be checkable.  It's the hard drive, itself, that checks its
> health and produces the stats, smartctl just gives you an interface.

Please see my reply to Rick.

> That you ought to try rebooting using a previous kernel, and see if
> problems persist.

I did, and the problem showed up with all three of the latest f24 versions available in the grub menu.

> Yes, an update can be more stressful than other PC activities, for
> *some* users.  But for other users, they're always subjecting their PC
> to a heavy workload, so a prolonged update session is nothing different
> from normal use.

I don't understand what you're saying here.  Both weekly patches went very quickly (I wish windows-7 were like that!) and with no errors reported in the output.

> But what type of power supply did you put in?  Did you match the wattage
> your supplier said you needed, did you overcompensate by an extra 100
> watts?  Did you get some generic Chinese thing, or something that had a
> reputation?

I did not figure out that part for myself.  I got advice from a friend with decades of experience working for IBM's high performance division, and then for Cray Research.  The power supply is a Thermaltake TR2 600W.  The system also has a Core i7-3770K @ 3.5GHz x 8, 16 GB memory, GeForce GTX 660 graphics card, an ASUS Xonar Essence STX audio card, a 2 TB hard drive, 2 blu-ray drives, keyboard, trackball, web cam (rarely plugged in), two 27-inch Dell monitors, and 2 small speakers.  It's no gaming system, but a rather high-powered programming workstation by 2013 standards.

Thank-you,
Bill.
_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]

Re: post-mortem: f24 boot fails; need help.

Tim-163
Tim:
>> Yes, an update can be more stressful than other PC activities, for
>> *some* users.  But for other users, they're always subjecting their
>> PC to a heavy workload, so a prolonged update session is nothing
>> different from normal use.

William Mattison:
> I don't understand what you're saying here.  Both weekly patches went
> very quickly (I wish windows-7 were like that!) and with no errors
> reported in the output.

You mentioned that the problems happened straight after doing a dnf
update, and wondered if *that* process could have been the cause of the
problems.  I was pointing out that an update is no different from any
other medium-duty processing the computer might do (a bit of heavy
thinking when it processes dependencies, idling along as new files get
downloaded, a bit of slightly heavy thinking for a few moments as the
packages are decompressed before they get saved to disc).

>> But what type of power supply did you put in?  Did you...

> I did not figure out that part for myself.  I got advice from a friend
> with decades of experience working for IBM's high performance
> division, and then for Cray research.  The power supply is a
> Thermaltake TR2 600W.  The system also has a Core i7-3770K @ 3.5GHz x
> 8, 16 GB memory, GeForce GTX 660 graphics card, an ASUS Xonar Essence
> STX audio card, a 2 TB hard drive, 2 blu-ray drives, keyboard,
> trackball, web cam (rarely plugged in), two 27-inch Dell monitors, and
> 2 small speakers.  It's no gaming system, but a rather high-powered
> programming workstation by 2013 standards.

I would have thought 600 watts is more than sufficient for a general PC.
If you look at what gamers do to their boxes (with their high end
graphics cards and virtually a CPU farm in a box), it's staggering the
amount of power that some PCs (allegedly) use.  I'm sure they don't
really use all that, but the short term peaks as things fire up, change
modes, etc., can be a heck of a lot higher than their nominal power
usage - those transients can trip up cheap and nasty supplies.

Thermaltake TR2 600W specs
Maximum output capability 600 watts (no surprise, considering the model
name, and I think they've got a good reputation).

ASUS Sabertooth Z77
Looks nice, but I see no power specs on their site.  Though I see a
review of that board with your processor that suggests up to 183 watts
normally, add another 100 watts if overclocked.

GeForce GTX 660 specs
Maximum power used by the card 140 watts
Minimum system power supply recommendation 450 watts

Hmm, yeah, love their thinking there.  Well, I suppose they're making
an estimation of the likely power requirement of the rest of your
system.

ASUS Xonar Essence STX
Looks nice, a card without those wonky 3.5 mm jacks, and designed for
sound quality.  The kind of thing I might have gone for if I were buying
new parts.  No power specs, but I wouldn't think it's a major power hog.

Hard drives under 10 watts
Blu-ray drives about 30 watts

Yes, sounds like a 600 watt supply should be fine.  And your friend
obviously has the background to figure that out, too.

So, if it's a power problem, that might be down to a fault rather than
being an insufficient supply, in general.
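The arithmetic behind "a 600 watt supply should be fine" can be written out using the upper-end figures quoted above. Two assumptions: the sound card has no published spec, so its 10 W is a guess, and the hard drive is counted at its 10 W ceiling. Summing every component at its maximum at once is deliberately pessimistic; monitors are left out because they have their own mains supplies.

```python
# Upper-end figures from the thread; the sound card's 10 W is an assumption.
COMPONENTS_W = {
    "board + i7-3770K": 183,
    "GeForce GTX 660": 140,
    "2 TB hard drive": 10,
    "Blu-ray drive 1": 30,
    "Blu-ray drive 2": 30,
    "Xonar Essence STX (assumed)": 10,
}
PSU_W = 600
OVERCLOCK_EXTRA_W = 100  # the review's "add another 100 watts if overclocked"


def budget(components, psu_w, extra_w=0):
    """Return (total load in watts, load as a fraction of PSU capacity)."""
    total = sum(components.values()) + extra_w
    return total, total / psu_w


for label, extra in (("stock", 0), ("overclocked", OVERCLOCK_EXTRA_W)):
    total, frac = budget(COMPONENTS_W, PSU_W, extra)
    print(f"{label:>11}: {total} W = {frac:.0%} of {PSU_W} W")
```

Even the pessimistic stock total comes to about two thirds of the supply's rating, which supports Tim's conclusion that a fault, rather than an undersized supply, is the more likely power explanation.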

Noting your other messages about hard drive errors, it may be that the
drive itself is failing.  Unrecoverable errors don't sound good, and
they never boded well for the couple of drives I had with them.  Some
people say that they can carry on using a drive with such bad sectors,
if there aren't many of them and they're not increasing.  But faults
with "unrecoverable, uncorrectable, unreadable" types of errors are a
big red flag.

The simple test is to try and write to the entire drive (which is
easiest to do when wiping the entire contents, rather than filling up
the space of an in-use drive), and see if that changes the error
condition.

For example, if it couldn't read the contents of something that was an
interrupted write (from a system crash, power failure, etc) on an
undamaged portion of the drive, but could wipe and re-use that spot,
that suggests the drive will be okay, and the error was caused
externally.

But if it can't write and read those sectors, with a fresh attempt, that
points the finger at the drive being at fault.
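That rule of thumb can be expressed as a comparison of SMART counters read before and after overwriting the whole drive (which destroys everything on it, so back up first). This is a sketch under assumptions: the function name is hypothetical, it uses attribute IDs 5 (reallocated) and 197 (pending) as the signal, and the middle case, where the drive quietly swapped bad sectors for spares, is an interpretation added here rather than something stated above.

```python
def interpret_rewrite(before, after):
    """Compare SMART raw counters read before and after a full-drive write.

    `before` and `after` map attribute IDs (5 = reallocated sectors,
    197 = pending sectors) to raw values.
    """
    pending_after = after.get(197, 0)
    newly_reallocated = after.get(5, 0) - before.get(5, 0)
    if pending_after == 0 and newly_reallocated == 0:
        # Every suspect sector was rewritten and read back in place.
        return "sectors rewrote cleanly: likely an external cause"
    if pending_after == 0:
        # The drive could not reuse some spots and swapped in spares.
        return (f"drive remapped {newly_reallocated} sector(s): "
                "surface damage, points toward the drive")
    return (f"{pending_after} sector(s) still unreadable after rewrite: "
            "suspect the drive")


print(interpret_rewrite({5: 0, 197: 8}, {5: 0, 197: 0}))
print(interpret_rewrite({5: 0, 197: 8}, {5: 3, 197: 0}))
```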

There are long and short SMART self-tests that do these kinds of things.
If you can afford to wipe the drive and test it, that may be the best
way forward.  If you go to your drive manufacturer's site, they
probably have a self-booting disc image to burn to test your drive
(Seagate and Western Digital, at least, did when I last did this in
the dim and distant past).

If you can't do that, my suggestion is to buy a new hard drive, install
a fresh OS onto it, and test drive your PC for a week or so.

--
[tim@localhost ~]$ uname -rsvp
Linux 3.9.10-100.fc17.x86_64 #1 SMP Sun Jul 14 01:31:27 UTC 2013 x86_64
(always current details of the computer that I'm writing this email on)

Boilerplate:  All mail to my mailbox is automatically deleted, there is
no point trying to privately email me, I only get to see the messages
posted to the mailing list.

The mindset of software designers: You know that feature that you, and
many thousands of other users, found useful? We removed it, because we
didn't like it. We also hard-coded the default settings that you keep
customising.


_______________________________________________
users mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]